A word cloud is a special visualization of text in which the more frequently 
used words are effectively highlighted by occupying more prominence in 
the representation. We have used Wordle to produce word-cloud analyses 
of the spoken and written responses of informants in two research projects. 
The product demonstrates a fast and visually rich way to enable 
researchers to have some basic understanding of the data at hand. Word 
clouds can be a useful tool for preliminary analysis and for validation of 
previous findings. However, Wordle is an adjunct tool and we do not 
recommend that this method be used as a stand-alone research tool 
comparable to traditional content analysis methods. 

Key Words: Wordle, Research Tool, Word Clouds, and Qualitative Research 


Introduction 

The potential of word clouds as a research tool 

In recent years, a number of social bookmarking tools have been developed. 
These tools are “public link management applications on the Web” (Hammond, Hannay, 
Lund, & Scott, 2005) that enable users to “tag” sites and cluster similar sites together. In 
essence, users can build personal libraries on the Web. These libraries are represented as 
word clouds. A word cloud is a special visualization of text in which the more frequently 
used words are effectively highlighted by occupying more prominence in the 
representation. Grammatical words and non-frequent words are hidden so that the 
resultant representation cleanly shows the most frequently occurring words of importance. 
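The counting behind such a representation can be sketched in a few lines of Python. This is only an illustration of the general idea, not Wordle's actual algorithm: the function name `word_frequencies`, the very short stop-word list, and the sample sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use much longer lists).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "the", "to", "was"}

def word_frequencies(text, min_count=1):
    """Count content words, ignoring case, stop words, and infrequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    # Words below min_count are hidden, as infrequent words are in a word cloud.
    return {w: n for w, n in counts.items() if n >= min_count}

sample = "The battery is short. The screen is small and the battery is heavy."
freqs = word_frequencies(sample)
# List the most frequent words first; a cloud would draw these largest.
print(sorted(freqs.items(), key=lambda item: -item[1])[:3])
```

A word-cloud program would then map each count to a font size, so that the most frequent words occupy the most space.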

The early social bookmarking tools were designed to manage and organize URLs. 
This study focuses on word clouds developed to analyze discrete pieces of text. A number 
of programs are available for generating these word clouds: TagCrowd (Steinbock, 2008; 
Sinclair & Cardew-Hall, 2008), MakeCloud (MakeCloud, 2008), ToCloud (ToCloud, 
2007) and Wordle (Feinberg, 2009). These programs are quick and automatic. Among 
them, Wordle may be the most versatile software to use. Users employ a user-friendly 
web-based interface to change the font, colour, and direction of words in a Wordle word 
cloud. Wordle outputs are regarded as “more personal and visual than the others” 
(Ramsden & Bate, 2008, p. 6) when compared to similar tools such as TagCrowd, 
MakeCloud and ToCloud. Wordle is described by its developer, Jonathan Feinberg, as a 
“toy” (Feinberg, 2009). We were intrigued by this comment and decided to see if the tool 
could be a scholarly toy of worth to the academic community. 

Word clouds reveal the frequencies of the different words that appear in a piece of 
text. To a certain extent, an understanding of the general composition of the frequently 
used words allows viewers to have an overview of the main topics and the main themes 
in a text, and may illustrate the main standpoints held by the writer of the text. 
Comparison of the word clouds generated from different texts should quickly reveal the 
differences between the ideas contained in these texts. In this sense, we wanted to see if 
the word-cloud strategy could be a potentially useful method for qualitative analysis of 
text. 

There are some examples of word clouds being used analytically. Clement, 
Plaisant, and Vuillemot (2008), for example, used Wordle to generate word clouds in a 
literary study to compare and contrast the styles of writing in The Making of Americans 
(Stein, 1995, originally written between 1906 and 1911) to those in a set of 19th century 
novels written by Jane Austen, Charles Dickens, George Eliot, and George Meredith. The 
word clouds clearly demonstrated that the use of “one” (mostly as a pronoun) is very 
prominent in The Making of Americans. They observed that the frequent use of this word, 
“accomplished by the word’s schizophrenic nature” (p. 1), contributed to the sense of 
“confusion” being developed in the work. 

Apart from literary work, word clouds have also been used to study public 
speeches. Dann (2008) used TagCrowd to analyze the 2008 Federal Budget speech of 
Australia. Dann remarked that the word cloud served well as a preliminary analysis. The 
representation allowed the researcher to see “the level of self reference to the incumbent 
government and the presence of relationship marketing keywords” (p. 14) which then led 
the researcher to carry out further explorations. 

Word clouds can also be useful in education. Ramsden and Bate (2008) suggested 
that word clouds can assist in analyzing the survey responses as teachers can “have a 
visual depiction of the responses within a minute” (p. 2). 

In this paper, we would like to elaborate on the possible uses of word clouds in 
educational research by demonstrating the use of Wordle in two of our research projects. 
Wordle seems to be particularly useful for studies that involve qualitative/thematic 
analyses of written or transcribed spoken text. Specifically, we would like to demonstrate 
that Wordle can be used as: 

• A tool for preliminary analysis, quickly highlighting main 
differences and possible points of interest, thus providing a 
direction for detailed analyses in following stages; and 

• A validation tool to further confirm findings and interpretations of 
findings. The word clouds thus provide an additional support for 
other analytic tools. 


Methodology 

Using Wordle to carry out preliminary analyses 

We looked at the use of Wordle as a research tool for preliminary analysis of 
focus-group transcripts. It was the beginning of a research project which was conducted 
to study the dynamics in focus-group meetings and to identify human factors (especially 
the interactions between the facilitators and the participants) that might affect the 
comments made by participants. Because of the known limitations of Wordle, which will 
be elaborated in the final section of the paper, the analysis was intended to be a simple 
strategy for us to obtain a quick but brief overview of the data. In this sense, it was a 
supplementary but not a main analytic tool. 

We analyzed the transcriptions of six focus-group meetings. Each meeting was 
about one hour long and involved one or two facilitators talking with a group. Student 
numbers in each group ranged between six and 14. The meetings involved two quite 
distinct projects. In both projects, students were interviewed about their educational 
experiences; the focus-group meetings were included in the project proposals which were 
vetted for ethical approval by The Chinese University of Hong Kong. All students were 
volunteers. 

The first three meetings were evaluations of a project called “Science Enrichment 
Programme for Secondary three to four (K9 - K10) Students.” It was a two-year 
enrichment programme for secondary school students gifted in science who attended 
workshops and completed projects at The Chinese University of Hong Kong. The goal of 
the project was to offer young gifted students greater opportunities to explore and 
develop their higher-order thinking skills, creativity, and personal-social competencies 
with a view to heightening their potential. The project employed a number of evaluation 
strategies to support reflection on the achievement of the objectives at various stages of 
the project. Apart from administration of surveys, we also met the students in focus-group 
settings at various stages in the programme. There were three focus-group meetings held 
at the end of the whole programme (2008), in which three groups of students (the 
Mathematics, Physics, and Biology streams) discussed their perceptions of various 
aspects of the programme. Transcripts were made of the three meeting tapes. 

The second set of focus-group meetings came from a study in which the research 
team met with groups of Year one and Year two students in the Faculty of Law at The 
Chinese University of Hong Kong. In 2006-07, the Faculty wished to find out students’ 
opinions on the soft-skill courses (e.g., legal research, writing, and information literacy) 
that had been newly introduced to the programme. The students were asked to fill in a 
questionnaire to comment on how much they could understand and apply the legal 
knowledge and skills they learnt from the courses. As it was thought that students could 
elaborate on the points they wanted to make in face-to-face discussions, students were 
also invited to attend focus-group meetings in April 2008. Transcriptions were made of 
three of these meetings. Two of the meetings involved Year one students and one 
involved Year two students. Below is a short extract from one of the transcripts. The 
Wordle diagram from this complete interview is shown in Figure 4. The bolded words 
below can be seen in Figure 4. 


Facilitator: Well, would it be possible to cut out these contents? 

S3: I believe the professors would strongly oppose this idea. 

S5: I don’t think it is possible to cut out all these. 

S2: The content was not bad. I just think that it would be better to add in 
some local elements. Then we can see these are also applicable to the 
cases in Hong Kong. We will be happier if that can be done. 

S5: These can’t be cut out completely. It is because they are about the origin 
or tradition of the law system. It is fine to tell us about the case in UK, 
but it is good to mention the situation in Hong Kong as well. Then we can 
see how such system has been implemented in Hong Kong. Better 
integration of the course contents can then be achieved. 

A study of the transcriptions in full would include time-consuming and detailed 
coding of the types of interactions and comments voiced in the six meetings. Before 
starting a long analytic process, we, the researchers, thought that Wordle might be used to 
provide a quick outline of the data. Each of the six transcriptions was fed into the Wordle 
system, resulting in six word clouds as output. We wanted to see whether the preliminary 
Wordle analysis could better inform the follow-up analysis. 

Methodology: Using Wordle to validate existing findings 

We also used Wordle as a research tool to validate our previous analysis in a 
study of students’ opinions about the use of eBooks, also at The Chinese University of 
Hong Kong. The project carried out a series of investigations into the usability and 
acceptability of eBooks. The entire project had a number of phases. In an earlier phase, 
we investigated students’ first impressions of the technology after they had had a brief 
introduction to eBooks. 

In the third phase of the project, the study focused on acceptability (or the actual 
likelihood of future use). We wanted to answer questions such as “In what ways do 
students actually use eBooks? (when, where, and with what devices?)” and “What are the 
factors that influence whether users will continue to use eBooks?” 

To answer these questions, the project contained a range of user-feedback 
sessions, reading sessions, and extended reading periods in which students read the 
eBooks in naturalistic settings for 12 weeks. Instruments used included questionnaires, 
interviews, focus-group meetings, video recordings of user actions, and online blogs 
where students commented on their eBook reading habits. Our contacts with the students 
(reported in Lam, Lam, Lam, & McNaught, 2009) generally affirmed that the technology 
has potential to enhance teaching and learning in a university setting. However, the 
experiences (especially of the long-term users) highlighted a number of challenges that 
need to be addressed. 

Six students participated in the extended reading study. Regularly throughout the 
extended reading period, each of the students kept an individual online diary in which 
they freely commented on what they liked and disliked about the eBook reading so far. 
Each of the students wrote about 1,000 to 2,000 words of online journal entries by the 
end of the period. Below is a sample extract of the blogs. 

I was reading a difficult chapter which I had to review back the pages that I read 
before to understand the chapter. Since it is not possible to read two different pages 
at the same time for ebook, it was difficult for me to refer back to the previous 
content while I am reading on the current page. At this point, book of hardcopy is 
definitely better than ebook. Besides, by rotating the screen for 90 degree, more 
words can be showed in each line, as the length of each line is wider. I found it is 
more comfortable to read in this way. 

We used Wordle to analyze the blog entries of five of the six students (one student 
wrote in Chinese and Wordle cannot handle Chinese characters). The text of their 
journals was fed into Wordle individually and five word clouds resulted. The extract 
above was part of the text used for Figure 11. The bolded words can be seen in the figure. 





The Qualitative Report May 2010 


Findings 

Using Wordle to carry out preliminary analyses 

Figures 1 to 3 are the word clouds of the three focus-group transcripts for the 
science enrichment project. The word clouds of the Faculty of Law study are presented in 
Figures 4 to 6. To comply with the notion that Wordle is used for fast analysis, minimal 
touchups were done to these word clouds. They are largely the first output of the raw text 
we submitted to the online Wordle system. 

Figure 1. Word cloud of the students discussing Physics enrichment activities. 

Figure 2. Word cloud of the students discussing Mathematics enrichment activities. 

Figure 3. Word cloud of the students discussing Biology enrichment activities. 

Figure 4. Word cloud of Law Year one students, group one, discussing “soft skills” 
courses. 

Figure 5. Word cloud of Law Year one students, group two, discussing “soft skills” 
courses. 

Figure 6. Word cloud of Law Year two students discussing “soft skills” courses. 



With even a cursory look at the six figures one can tell that the science meetings ran very 
differently from the law meetings. Apart from the fact that the two sets of focus-group 
meetings were about very different topics and hence the key words shown in the word 
clouds are largely different, the two sets of word clouds have the following differences 
that indicate that the dynamics of the discussions in the two sets of meetings also varied a 
great deal: 

• The facilitators had a much more prominent role in the science 
enrichment meetings. 

• The student participation in the science meetings was low. Many 
students did not appear to have participated. 

• Fewer actual comments were received in the science meetings 
compared with the law meetings. From the Wordle analysis, we 
see that the number of words that appear to refer to actual 
discussions and comments is considerably smaller in Figures 1 to 3 
than in Figures 4 to 6. 

The word clouds suggest that the focus-group meetings for the science enrichment 
programme ran less successfully than those in the Faculty of Law. These observations 
were confirmed by the reflections of the facilitators in the meetings. Much of the time 
was occupied by the facilitators in the science meetings and the students were largely 
unwilling to talk. Even when they talked, they gave simple replies rather than elaborated 
answers. In contrast, the dynamics in the Faculty of Law meetings were much better. 
Facilitators still took up a great deal of time but many students were willing and able to 
participate as well. For example, Figure 6 shows that, in the Year two group, nearly all 
the students talked, sharing approximately the same amount of time as the facilitator. The 
richness of the content words in these three figures generally indicates that many ideas 
were brought up and discussed in the meetings. 

In this study, the word clouds effectively gave the research team a fast and 
preliminary understanding of what was happening in each of the six meetings. In so doing, 
they directed the researchers’ attention to differences in group dynamics in these 
meetings. Follow-up investigation will be done to look at the many factors that might 
have affected the focus-group performance. Factors include both the composition of the 
groups and the strategies used by both the facilitators and the participants. For example, 
we will question whether younger participants (those in the science meetings were around 
15 years old while students in the law school were 19 or older) are less willing to 
speak up. We will consider whether science students are, on the whole, less expressive 
than law students. We will also look at the questions and follow-up questions/statements 
used by the facilitators to see whether some strategies are more likely to lead to student 
involvement. 


Using Wordle to validate existing findings 

Figures 7 to 11 are the word clouds of the student blogs recorded in the eBook 
project. As in the first study, these word clouds are rapid output generated by putting the 
text of the blogs into the online Wordle system. Since the actual themes and points 
mentioned in the discussions were of interest in this study, one additional step was taken 
to clarify some of the ambiguity of the Wordle graphics. As the units of representation in 
these word clouds are words, instances of use such as “not convenient” will add to the 
frequency count of the word “convenient”, a sense that is opposite to the original 
meaning in the text. Thus, in each of the five blogs, we deleted the space after 
every “not” so that “not convenient” becomes the single word 
“notconvenient”. In this way, the word cloud output would count the frequency of 
“notconvenient” instead of treating it as “convenient” and the negative sense could be 
preserved. The changes to text were done relatively quickly using the automatic replace 
function in the word processor. The general goal to use Wordle as a quick research 
strategy was still maintained. 
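The same replacement can be expressed as a short script. The sketch below is an assumed equivalent of the word-processor step, not the procedure we actually ran; the function name `join_negations` and the sample sentence are hypothetical.

```python
import re

def join_negations(text):
    """Fuse "not" with the word that follows it so the pair is counted as one word."""
    # \b ensures that only the standalone word "not" matches, so "cannot" is untouched.
    return re.sub(r"\b(not)\s+(\w+)", r"\1\2", text, flags=re.IGNORECASE)

print(join_negations("It is not convenient to carry."))
# prints: It is notconvenient to carry.
```

Applied to a whole blog before it is submitted to a word-cloud tool, a step like this keeps “notconvenient” separate from “convenient” in the frequency counts.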


Figure 7. Word cloud of the blog written by Student one in the eBooks project. 

Figure 8. Word cloud of the blog written by Student two in the eBooks project. 

Figure 9. Word cloud of the blog written by Student three in the eBooks project. 

Figure 10. Word cloud of the blog written by Student four in the eBooks project. 

Figure 11. Word cloud of the blog written by Student five in the eBooks project. 

As reported in Lam et al. (2009), the students who participated in the extended 
reading activity answered a simple question about whether they would use eBooks for 
learning in the future on two occasions. They first gave the researcher their opinions right 
after they were introduced to the technology and had had some hands-on experience with 
reading and using eBooks. Also, at the end of the 12-week reading period, in which they 
were asked to spend about four hours a week using eBooks to learn, they were 
asked the same question again. Their replies are shown in Figure 12. 

Figure 12. Changes in students’ perception about the value of eBooks (Lam et al., 2009, 
p. 41) 



We would argue that the word clouds and the students’ replies shown in Figure 12 
represent similar findings. 

Student one (Figure 7) used quite a number of words loaded with negative 
meanings in her blog such as “inconvenient,” “troublesome,” “troubles,” “difficult,” and 
“frustrated”. We were able to see that eBook reading did not seem to be a very enjoyable 
experience for the student. Student one told the researchers at the end of the extended 
reading period that it is “unlikely” that she would use eBooks by choice. 

Student two (Figure 8) was also relatively negative about the experience. The 
words “boring,” “difficult,” “renewing,” and “battery” are seen in the word clouds. It 
appears that the student found the battery life short and had to renew (recharge) the 
battery more frequently than desired. Student two told the researchers that he would 
“definitely not” use eBooks as a real learning tool. 

Student three (Figure 9) had both positive and negative feelings towards eBooks. 
On the word cloud we see words such as “inconvenient” and “impossible”. However, he 
also wrote quite a number of times about “wonderful”, “enjoy” and “interesting.” In the 
survey, Student three remarked that he was neutral about whether he would continue to 
read eBooks for study. 

Student four (Figure 10) had few negative feelings, judged by few words with 
negative loadings in the word cloud: a limited use of “difficult” and “failed” was all we 
could find. In fact the reading activity did not seem to arouse much emotion in this 
student at all. This student spent his time writing rather objectively about the functions 
used and the time spent on the reading tasks. At the end of the reading, the student said 
that he would “definitely” use the technology again. 

Student five (Figure 11) wrote quite negatively on the blog. There were frequent 
mentions of words like “troublesome,” “inconvenient,” “difficult,” and “battery” in the 
word cloud. In contrast, few words are found that can be regarded as clearly positive 
(“comfortable” may be the only one). Both before and after the extended 
reading period, Student five remarked that it was “unlikely” that he would 
use the eBook strategy for studying. 

We see that the word clouds not only roughly validated the findings we obtained 
from another source (survey), they also quickly revealed to us some underlying reasons 
for students’ like or dislike of this eLearning strategy. For example, the Wordle analysis 
suggested to us that if the students found the technology difficult and inconvenient to use, 
it is not likely that they would use it in their study. A long battery life may also be of 
great importance when we are talking about mobile learning strategies. 


Conclusion 


Our two experiences in using Wordle to inform research have led us to suggest 
that word clouds can be a useful research tool to aid educational research. We have 
demonstrated that they can allow researchers to quickly visualize some general patterns 
in text. In the research setting, these texts are likely to be informants’ spoken (transcribed) 
and written responses. The visualization allows researchers to grasp the common themes 
in the text, and sometimes even to find out main differences between sets of responses. 

As research tools, however, word clouds have certain limitations and we need to 
be well aware of them. First of all, as frequency is an important aspect of the tool, we 
would argue that the strategy works best for analyzing text in which the full text of each 
informant’s speech is preserved. In other words, it is less meaningful to input researchers’ 
minutes or summaries of a focus-group meeting into the system as the frequencies of the 
words used will be changed. Rather, transcription of the actual discussions should be used. 
Similarly, in the written format, it is best to analyze the raw written responses provided 
by informants rather than the second-level summaries or reports compiled by the 
researchers. 

Another limitation of word clouds is that the words are retrieved out of context. It 
is also not possible for users to trace the codes back to the original text. The frequent 
mention of words such as “screen” and “battery” in our second study is insufficient 
information for researchers to know the exact opinions concerning the screen and battery. 

Word clouds treat each word as the unit of analysis. This mechanical 
manipulation of text is fast but at the same time it can be misleading because it neglects 
the semantics of the words and also the phrases and even sentences the words are 
composed of. As noted above, the treatment will fail to treat “not convenient” as a 
meaningful phrase in itself. Even if we do what we did in the second study and combine 
meaningful phrases into joint-words (e.g., “notconvenient”), ambiguity cannot be 
completely avoided. For example, the system will not be able to reflect the negative sense 
of a saying like “I wish the software could be more convenient.” Because of this 
simplistic treatment of word forms rather than the actual meanings they carry, the 
strategy outlined in this paper is not recommended as a stand-alone research tool 
comparable to traditional content analysis methods. 

However, overall, as shown in the two studies reported in this paper, word clouds 
can be a useful tool for preliminary analysis and for validation of previous findings. 